Sampling distributions, null hypotheses, etc.
Estimation and hypotheses (QMLS 1, Unit 7)
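# Observed difference in mean growth rate (High - Low).
# Uses dplyr and tidyr (e.g., via library(tidyverse)).
d <- DD |>
  group_by(Temperature) |>
  summarize(xbar = mean(growthR)) |>
  pivot_wider(names_from = Temperature, values_from = xbar) |>
  mutate(d = High - Low) |>
  pull(d)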
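nreps <- 1e4
diffs.e <- numeric(length = nreps)
diffs.e[1] <- d  # the observed difference counts as one permutation
for (ii in 2:nreps) {
  # Shuffle the group labels, then recompute the difference in means
  Rand_G <- sample(DD$Temperature)
  diffs.e[ii] <- mean(DD$growthR[Rand_G == "High"]) -
    mean(DD$growthR[Rand_G == "Low"])
}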
pe <- ggplot(data.frame(diffs.e), aes(diffs.e)) +
  geom_histogram(binwidth = 0.1, fill = "gray75") +
  geom_segment(x = d, xend = d,
               y = 0, yend = Inf,
               linewidth = 2,
               color = "firebrick4") +
  ylim(c(0, 1500)) +
  xlim(c(-1.2, 1.2)) +
  labs(x = "Difference (High - Low)", y = "Count")
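pe
The observed difference d is stored as the first element of the diffs.e vector.
Each remaining iteration shuffles the Temperature column and recomputes the difference in means.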
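Once the permutation distribution exists, an empirical p-value is the proportion of permuted differences at least as extreme as the observed one. The sketch below is self-contained, so it uses hypothetical simulated data (`DD_demo`, with made-up group means) in place of the real `DD`. The two-sided comparison is also an assumption; a one-sided test would compare `diffs >= d_obs` instead.

```r
# Hypothetical stand-in data; the real DD is not reproduced here.
set.seed(42)
DD_demo <- data.frame(
  Temperature = rep(c("Low", "High"), each = 20),
  growthR = c(rnorm(20, mean = 10, sd = 1), rnorm(20, mean = 11, sd = 1))
)

# Observed difference in group means (High - Low)
d_obs <- mean(DD_demo$growthR[DD_demo$Temperature == "High"]) -
  mean(DD_demo$growthR[DD_demo$Temperature == "Low"])

nreps <- 1e4
diffs <- numeric(length = nreps)
diffs[1] <- d_obs  # the observed value counts as one permutation
for (ii in 2:nreps) {
  shuffled <- sample(DD_demo$Temperature)
  diffs[ii] <- mean(DD_demo$growthR[shuffled == "High"]) -
    mean(DD_demo$growthR[shuffled == "Low"])
}

# Two-sided empirical p-value: share of permuted differences
# whose magnitude is at least that of the observed difference
p_perm <- mean(abs(diffs) >= abs(d_obs))
p_perm
```

Because the observed difference is included as the first element, the p-value can never be smaller than 1 / nreps.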
# Under the null hypothesis, both groups share a single mean
mu_both <- mean(c(muL, muH))
nreps <- 1e4
diffs.s <- numeric(length = nreps)
for (ii in 1:nreps) {
  diffs.s[ii] <- mean(rnorm(n1, mu_both, sd1)) -
    mean(rnorm(n2, mu_both, sd2))
}
ps <- ggplot(data.frame(diffs.s), aes(diffs.s)) +
  geom_histogram(binwidth = 0.1, fill = "gray75") +
  geom_segment(x = d, xend = d,
               y = 0, yend = Inf,
               linewidth = 2,
               color = "firebrick4") +
  ylim(c(0, 1500)) +
  xlim(c(-1.2, 1.2)) +
  labs(x = "Difference (High - Low)", y = "Count")
ps
\[t = \frac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\]
Pooled sample standard deviation:
\[s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\]
\[s_p = \sqrt{\frac{(20 - 1) \cdot 1 + (20 - 1) \cdot 1}{20 + 20 - 2}} = \sqrt{\frac{19 + 19}{38}} = 1\]
\[t = \frac{\bar{y}_1 - \bar{y}_2}{1 \cdot \sqrt{\frac{1}{20} + \frac{1}{20}}}\]
\[t \cdot \sqrt{\frac{1}{20} + \frac{1}{20}} = \bar{y}_1 - \bar{y}_2\]
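Evaluating the scale factor for these values makes the conversion concrete (a worked step, assuming \(n_1 = n_2 = 20\) and \(s_p = 1\) as above):
\[\sqrt{\frac{1}{20} + \frac{1}{20}} = \sqrt{0.1} \approx 0.316, \qquad \bar{y}_1 - \bar{y}_2 \approx 0.316 \, t\]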
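# Pooled standard deviation
sp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) /
             (n1 + n2 - 2))
# Factor that converts a t-value into a difference in means
scale_t <- sp * sqrt(1 / n1 + 1 / n2)
# Draw t-values and rescale them to the difference scale
diffs.a <- scale_t * rt(nreps, df = n1 + n2 - 2)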
pa <- ggplot(data.frame(diffs.a), aes(diffs.a)) +
  geom_histogram(binwidth = 0.1, fill = "gray75") +
  geom_segment(x = d, xend = d,
               y = 0, yend = Inf,
               linewidth = 2,
               color = "firebrick4") +
  ylim(c(0, 1500)) +
  xlim(c(-1.2, 1.2)) +
  labs(x = "Difference (High - Low)", y = "Count")
pa
sp is the pooled standard deviation; multiplying a t-value by scale_t converts it into a difference in means.
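The same scaling can be run in reverse: dividing an observed difference by the scale factor gives a t-value, and the t distribution then yields a p-value directly. The sketch below checks this against `t.test()` with `var.equal = TRUE`. The data (`y_high`, `y_low`) are hypothetical simulated values, and the pooled standard deviation here is computed from sample variances, which is an assumption about how `sd1` and `sd2` were obtained.

```r
# Hypothetical simulated samples; not the course data.
set.seed(1)
n1 <- 20
n2 <- 20
y_high <- rnorm(n1, mean = 11, sd = 1)
y_low <- rnorm(n2, mean = 10, sd = 1)

# Observed difference in means
d_obs <- mean(y_high) - mean(y_low)

# Pooled standard deviation from the sample variances
sp <- sqrt(((n1 - 1) * var(y_high) + (n2 - 1) * var(y_low)) /
             (n1 + n2 - 2))
scale_t <- sp * sqrt(1 / n1 + 1 / n2)

# Convert the difference to a t-value, then to a two-sided p-value
t_obs <- d_obs / scale_t
p_analytic <- 2 * pt(-abs(t_obs), df = n1 + n2 - 2)

# The pooled-variance t-test should give the same p-value
p_ttest <- t.test(y_high, y_low, var.equal = TRUE)$p.value
all.equal(p_analytic, p_ttest)
```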
Many (but not all) Monte Carlo approaches are different ways of obtaining an empirical or simulated null distribution.
Consider our jackals data set again